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Abstract: In this paper, we present a formal quantification of epistemic 
uncertainty induced by numerical solutions of ordinary and partial differen¬ 
tial equation models. Numerical solutions of differential equations contain 
inherent uncertainties due to the finite dimensional approximation of an un¬ 
known and implicitly defined function. When statistically analysing models 
based on differential equations describing physical, or other naturally oc¬ 
curring, phenomena, it is therefore important to explicitly account for the 
uncertainty introduced by the numerical method. This enables objective de¬ 
termination of its importance relative to other uncertainties, such as those 
caused by data contaminated with noise or model error induced by missing 
physical or inadequate descriptors. To this end we show that a wide variety 
of existing solvers can be randomised, inducing a probability measure over 
the solutions of such differential equations. These measures exhibit contrac¬ 
tion to a Dirac measure around the true unknown solution, where the rates 
of convergence are consistent with the underlying deterministic numeri¬ 
cal method. Ordinary differential equations and elliptic partial differential 
equations are used to illustrate the approach to quantifying uncertainty in 
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both the statistical analysis of the forward and inverse problems. 
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1. Introduction and Motivation 

The numerical analysis literature has developed a large range of efficient algo¬ 
rithms for solving ordinary and partial differential equations, which are typically 
designed to solve a single problem as efficiently as possible [12, 8]. When classical 
numerical methods are placed within statistical analysis, however, we argue that 
significant difficulties can arise. Statistical models are commonly constructed by 
naively replacing the analytical solution with a numerical method; these models 
are then typically analysed by computational methods that sample the solver 
with many slightly varying configurations. This is problematic, not only be¬ 
cause the numerical solutions are merely approximations, but also because the 
assumptions made by the numerical method induce correlations across multiple 
numerical solutions. While the distributions of interest commonly do converge 
asymptotically as the solver mesh becomes dense (e.g., in statistical inverse 
problems [6]), we argue that at a finite resolution the analysis may be vastly 
overconfident as a result of these unmodelled errors. 

The purpose of this paper is to address these issues by the construction and 
rigorous analysis of novel probabilistic integration methods for both ordinary 
and partial differential equations. The approach in both cases is similar: we 
identify the key discretisation assumptions and introduce a local random field, 
in particular a Gaussian held, to rehect our uncertainty in those assumptions. 
The probabilistic solver may then be sampled repeatedly to interrogate the 
uncertainty in the solution. For a wide variety of commonly used numerical 
methods, our construction is straightforward to apply and preserves the formal 
order of convergence of the original method. 

Furthermore, we demonstrate the value of these probabilistic solvers in sta¬ 
tistical inference settings. Analytic and numerical examples show that using a 
classical non-probabilistic solver with inadequate discretisation when perform¬ 
ing inference can lead to inappropriate and misleading posterior concentration 
in a Bayesian setting. In contrast, the probabilistic solver reveals the struc¬ 
ture of uncertainty in the solution, naturally limiting posterior concentration as 
appropriate. 

Our strategy hts within the emerging held known as Probabilistic Numerics 
[14], a perspective on computational methods pioneered by Diaconis [7], and 
subsequently Skilling [23]. Under this framework, solving differential equations 
is recast as the statistical inference problem of hnding the latent, unobserved 
function that solves the differential equation, based on observable constraints. 
The output from such an inference problem is a probability measure over func¬ 
tions that satisfy the constraints imposed by the specihc diherential equation. 
This measure formally quantihes the uncertainty in candidate solution(s) of the 
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differential equation and allows us to coherently propagate this uncertainty when 
carrying out statistical analyses, for example, in uncertainty quantification [25] 
or Bayesian inverse problems [6]. 

1.1. Review of existing work 

Earlier work in the numerical analysis literature including randomisation in 
the approximate integration of ordinary differential equations (ODEs) includes 
[5, 24]. However, the connections between probability and numerical analysis 
were elucidated in the foundational paper [7], which explains this connection 
in the context of Bayesian statistics and sets out an agenda for Probabilistic 
Numerics. It was in [23] that a concrete demonstration of how numerical in¬ 
tegration of ordinary differential equations could be interpreted as a Bayesian 
inference method, and this idea was converted into an operational methodology, 
with consistency proofs given in [4]. 

The methodology detailed in [23, 4] (explored in parallel in [13]) builds a 
probabilistic structure which contains two sources of randomness: an overar¬ 
ching Gaussian process structure and an internal randomisation of the mean 
process, resulting in different draws of the mean for each realisation. However, 
careful analysis reveals that, because of the conjugacy inherent in linear Gaus¬ 
sian models, the posterior variance of the outer Gaussian process is independent 
of the observed constraints, and thus does not reveal any information about the 
uncertainty in the particular problem being solved. The randomisation of the 
mean process is more useful, and is related to the strategy we present here. 
Those initial efforts were limited to first order Euler schemes [12], and were 
later extended to Runge-Kutta methods [22], although the incomplete charac¬ 
terisation of uncertainty remains. The existing literature has focussed only on 
ordinary differential equations and a formal probabilistic construction for par¬ 
tial differential equations (PDEs) that is generally applicable remains an open 
issue. All of these open issues are resolved by this work. Unlike [23, 4, 22], which 
are exclusively Bayesian by construction, our probabilistic solvers return a prob¬ 
ability measure which can then be employed in either frequentist or Bayesian 
inference frameworks. 

Our motivation for enhancing inference problems with models of discretisa¬ 
tion error is similar to the more general concept of model error, as developed by 
[18]. However, our focus on errors arising from the discretisation of differential 
equations leads to more specialised methods. Existing strategies for discretisa¬ 
tion error include empirically fitted Gaussian models for PDE errors [17] and 
randomly perturbed ODEs [2]; the latter partially coincides with our construc¬ 
tion, but our motivation and analysis is distinct. Recent work [3] uses Bayes 
factors to analyse the impact of discretisation error on posterior approximation 
quality. Probabilistic models have also been used to study error propagation due 
to rounding error; see [11]. 

The remainder of the paper has the following structure: Section 2 introduces 
and formally analyses the proposed probabilistic solvers for ODEs. Section 3 
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explores the characteristics of random solvers employed in the statistical analysis 
of both forward and inverse problems. Then, we turn to elliptic PDEs in Section 
4, where several key steps of the construction of probabilistic solvers and their 
analysis have intuitive analogues in the ODE context. Finally, an illustrative 
example of PDE inference problem is presented in Section 5.^ 

2. Probability Measures via Probabilistic Time Integrators 

Consider the following ordinary differential equation (ODE): 

^ = f{u), u{0) = uo, (2.1) 

where u{-) is a continuous function taking values in We let denote the 
flow-map for Equation (2.1), so that u{t) = <i>t(M(0)). The conditions ensuring 
that this solution exists will be formalised in Assumption 2, below. 

Deterministic numerical methods for the integration of this equation on time- 
interval [0, T] will produce an approximation to the equation on a mesh of points 
{tk = with Kh = T, (for simplicity we assume a fixed mesh). Let 

Uk = u{tk) denote the exact solution of (2.1) on the mesh and Uk « Uk denote 
the approximation computed by the numerical method, based on evaluating / 
(and possibly higher derivatives) at a finite set of points that are generated 
during numerical integration. Typically these methods output a single discrete 
solution {Uk}k^Q, possibly augmented with some type of error indicator, but do 
not statistically quantify the uncertainty remaining in the path. 

Let Xa,b denote the Banach space (^([a, 6]; K”). The exact solution of (2.1) 
on the time-interval [0, T] may be viewed as a Dirac measure Su on Xq^t at the 
element u that solves the ODE. We will construct a probability measure ft^ on 
Xq^t, that is straightforward to sample from both on and off the mesh, for which 
h quantifies the size of the discretisation step employed, and whose distribution 
reflects the uncertainty resulting from the solution of the ODE. Convergence of 
the numerical method is then related to the contraction of to (5„. 

We briefly summarise the construction of the numerical method. Let : 
K" —>■ M" denote a classical deterministic one-step numerical integrator over 
time-step h, a class including all Runge-Kutta methods and Taylor methods for 
ODE numerical integration [12]. Our numerical methods will have the property 
that, on the mesh, they take the form 

Uk+i = -^hiUk)+Uh), ( 2 . 2 ) 

where ^fe(h) are suitably scaled, i.i.d. Gaussian random variables. That is, the 
random solution iteratively takes the standard step, followed by perturba¬ 
tion with a random draw, ^/c(h), modelling uncertainty that accumulates be¬ 
tween mesh points. The discrete path {C/fc}^Q is straightforward to sample and 

^Supplementary materials and code are available online: http://www2.warwick.ac.uk/ 
fac/sci/statistics/staff/academic-research/girolami/probints/ 

^To simplify our discussion we assume that the ODE is autonomous, that is, f{u) is 
independent of time. Analogous theory can be developed for time dependent forcing. 
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in general is not a Gaussian process. Furthermore, the discrete trajectory can 
be extended into a continuous time approximation of the ODE, which we define 
as a draw from the measure 

The remainder of this section develops these solvers in detail and proves 
strong convergence of the random solutions to the exact solution, implying that 
ft^ —>■ in an appropriate sense. Finally, we establish a close relationship 

between our random solver and a stochastic differential equation (SDE) with 
small mesh-dependent noise. 


2.1. Probabilistic time-integrators: general formulation 


The integral form of Equation (2.1) is 

u{t) = Uo-^ / f{u{s))ds. 

Jo 

The solutions on the mesh satisfy 


(2.3) 


r^k+1 


Uk+i=Uk+ / f{u{s))ds, 


(2.4) 


and may be interpolated between mesh points by means of the expression 


(t) = Uk + [ f{u{s))ds, te[tk,tk+i). (2.5) 

Jtk 


We may then write 


u{t) = Uk+ g{s)ds, te[tk,tk+i), 

Jtk 


( 2 . 6 ) 


where g{s) = f{u{s)') is an unknown function of time. In the algorithmic setting 
we have approximate knowledge about g{s) through an underlying numerical 
method. A variety of traditional numerical algorithms may be derived based on 
approximation of g{s) by various simple deterministic functions g^{s). Perhaps 
the simplest such numerical method arises from invoking the Euler approxima¬ 
tion that 

g^{s) = f(Uk), sG[tk,tk+i). (2.7) 

In particular, if we take t = t^+i and apply this method inductively the cor¬ 
responding numerical scheme arising from making such an approximation to 
g{s) in (2.6) is Uk+i = Uk + hf{Uk). Now consider the more general one-step 
numerical method Uk+i = '^hiUk)- This may be derived by approximating g(s) 
in (2.6) by 

g'^is) =-^(^riUk)) , se[tk,tk+i). (2.8) 

dr \ / r=s-tk 
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We note that all consistent (in the sense of numerical analysis) one-step methods 
will satisfy 

The approach based on the approximation (2.8) leads to a deterministic nu¬ 
merical method which is defined as a continuous function of time. Specifically 
we have U{s) = s-tkiUk), s G [tk,tk+i)- Consider again the Euler approxi¬ 
mation, for which 'l'r(C^) = U + Tf{U), and whose continuous time interpolant 
is then given by U{s) = Uk + {s — tk)f{Uk), s G [tk,tk+i). Note that this 
produces a continuous function, namely an element of Xq^t, when extended to 
s G [0,r]. The preceding development of a numerical integrator does not ac¬ 
knowledge the uncertainty that arises from lack of knowledge about g(s) in the 
interval s G [tfe,tfe+i). We propose to approximate g stochastically in order to 
represent this uncertainty, taking 

^ +Xk{s-tk), SG[tk,tk+l) 

where the {xk} form an i.i.d. sequence of Gaussian random functions defined 
on [0,/i] with Xk N{0,C^)^. 

We will choose to shrink to zero with h at a prescribed rate, and also to 
ensure that Xk G Xo^h almost surely. The functions {xk} represent our uncer¬ 
tainty about the function g. The corresponding numerical scheme arising from 
such an approximation is given by 

Uk+i = 'i>h{Uk)+Uh) (2.9) 

where the i.i.d. sequence of functions {^fc} lies in Xg^h and is given by 

^kit) = [ Xfe(r)dr. (2.10) 

Jo 

Note that the numerical solution is now naturally defined between grid points, 
via the expression 

U{s) = '^s-tu:{Uk) + ^k{s-tk), SG[tk,tk+l). (2.11) 

In the case of the Euler method, for example, we have 

Uk+i = Uk + hf{Uk)+Uh) ( 2 . 12 ) 

and, between grid points, 

U{s) = Uk + (s -tk)f{Uk) + ikis-tk), sG[tk,tk+i). (2.13) 

This method is illustrated in Figure 1. 

^We use Xk ~ N{0,C^) to denote a zero-mean Gaussian process defined on [0, h] with a 
covariance kernel cov{xkWiXk{^)) — C^{t,s). 
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(a) Deterministic Euler. (b) Randomized Euler. 


Fig 1: An illustration of deterministic Euler steps and randomised variations. 
The random integrator in (b) outputs the path in red; we overlay the standard 
Euler step constructed at each step, before it is perturbed (blue). 


While we argue that the choice of modelling local uncertainty in the flow-map 
as a Gaussian process is natural and analytically favourable, it is not unique. 
It is possible to construct examples where the Gaussian assumption is invalid; 
for example, when a highly inadequate time step is used, a systemic bias may 
be introduced. However, in regimes where the underlying deterministic method 
performs well, the centred Gaussian assumption is a reasonable prior. 

Finally, we comment on sampling off-grid. Imagine that we wish to sample the 
continuous time solution at point s € {tk, tk+i)- This might be performed in two 
different ways, given the solution generated on-grid, up to time tfc, producing 
sample Uk- In the first method, we draw the sample U{s) = tki^k) +t,k{s — 
tk). Having done so we must be careful to draw Uk+i = '^h{Uk) +ik{h) \ Ck{s — 
tk). This is not complicated, because all random variables are Gaussian given 
available information, but needs to be respected via conditioning on the end 
points of the grid. An alternative is to draw the sample Uk+i = '^th{Uk) +Ck{h) 
first, followed by the sample U{s) — 'i's-tk{Uk) + ^k{s — tk) \ ik{h), which the 
conditional Gaussian structure facilitates. 

2.2. Strong convergence result 

To prove the strong convergence of our probabilistic numerical solver, we first 
need two assumptions quantifying properties of the random noise and of the 
underlying deterministic integrator, respectively. In what follows we use (•,•) 
and I • I to denote the Euclidean inner-product and norm on K”. We denote the 
Frobenius norm on by | • |f, and denotes expectation with respect to 

the i.i.d. sequence {xk}. 

Assumption 1. Let ^kit) := f Xk(s)ds with Xk dV{0, C^). Then there exists 
K > 0,p > 1 such that, for all t G [0, h], in par¬ 

ticular ¥f‘\^k{t)\'^ < Kt^P'^^. Furthermore we assume the existence of matrix Q, 
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independent of h, such that E^[f,k{h)£,k{h)'^] = Qhf^^^ . 

Here, and in the sequel, K \s a constant independent of h, but possibly 
changing from line to line. The covariance kernel is not uniquely defined, 
but the uncertainty at the beginning of the step must be zero and increase with 
step length, scaling exactly as at the end of the interval. Implementation 

requires selection of the constant matrix Q; in our numerical examples we assume 
a scaled identity matrix, Q = al, and we discuss one possible strategy for 
choosing a in Section 3.1. 

Assumption 2. The function f and a sufficient number of its derivatives are 
bounded uniformly in K" in order to ensure that f is globally Lipschitz and that 
the numerical flow-map 'I'/j has uniform local truncation error of order q -\- 1: 

sup !'!'((«) — ‘f’i(u)| < Kt‘^^^. 

uGR’* 

Remark 2.1. We assume globally Lipschitz f, and bounded derivatives, in order 
to highlight the key probabilistic ideas, whilst simplifying the numerical analysis. 
Future work will address the non-trivial issue of extending of analyses to weaken 
these assumptions. In this paper, we provide numerical results indicating that a 
weakening of the assumptions is indeed possible. 

Theorem 2.2. Under Assumptions 1,2 it follows that there is K > 0 such that 

sup - C/fcP < 

0<kh<T 

Furthermore 

sup E^\u{t) - U{t)\ < 

0<t<T 

This theorem implies that every probabilistic solution is a good approxima¬ 
tion of the exact solution in both a discrete and continuous sense. Choosing 
p > q is natural if we want to preserve the strong order of accuracy of the un¬ 
derlying deterministic integrator; we proceed with the choice p = q, introducing 
the maximum amount of noise consistent with this constraint. 

Proof. We first derive the convergence result on the grid, and then in continuous 
time. From (2.9) we have 


Uk+i = 'ith{Uk)+Uh) (2.14) 

whilst we know that 

Mfc+i = $/i(itfe). (2.15) 

Dehne the truncation error = '^h{Uk) — ^h{Uk) and note that 

Uk-\-l = ^h{Uk) + Cfc + f,k{hl). (2.16) 

Subtracting equation (2.16) from (2.15) and defining Ck = Uk — Uk, we get 
Cfe+i = ^hiuk) - ^h{uk - Ck) - Ck- ikih). 
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Taking the Euclidean norm and taking expectations gives, using Assumption 1 
and the independence of the ^k, 


E'^lcfc+ip 




^h{uk) 


^h{uk - ek) - e/c 


+ 0{h^P+^) 


where the constant in the term is uniform in k : 0 < kh < T. As¬ 
sumption 2 implies that ek = again uniformly in k : 0 < kh < T. 

Noting that <i)/i is globally Lipschitz with constant bounded by 1 + Lh under 
Assumption 2, we then obtain 


E^'lefc+ip < il + LhfE^\ek\^+E'^\{hi{^k{uk)-‘^h{uk-ek)),h-hk) 

+ C>(/i29+2) +C>(/l2P+l). 


Using Cauchy-Schwarz on the inner-product, and the fact that is Lipschitz 
with constant bounded independently of h, we get 

E'‘|efc+ip < (1 + 0{h))E^\ek\^ + C>(h^9+i) -t 0{h'^P+^). 

Application of the Gronwall inequality gives the desired result. 

Now we turn to continuous time. We note that, for s S \tk,tk+i)^ 

U{s) = '^s-tk [Uk) + ik{s — tk), 
u{s) = <^>s-tduk). 


Let Tt denote the cr-algebra of events generated by the {^k} up to time t. 
Subtracting we obtain, using Assumptions 1 and 2 and the fact that ^s-t^ has 
Lipschitz constant of the form 1 -|- 0(h), 

E^{\U{s) - < \^s-u{Uk) - + I'^s-tAUk) - <^s-tAUk)\ 

+ EWUs-tk)\\Tt,) 

< (1 + Lh)|efe| + + E'^lUs - 4)1 

< (1 + Lh)|efc| + + {E’^Ms - 4)n^ 

< (1 + Lh)|efc| + ^ 0(/,P+i), 

Now taking expectations we obtain 

E'‘|C/(s)-u(s)| < {1 + Lh){E^\ek\^)^ + 0{h‘‘+^) + 0{hP+i). 


Using the on-grid error bound gives the desired result, after noting that the 
constants appearing are uniform in 0 < kh < T. □ 
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2.3. Examples of probabilistic time-integrators 


The canonical illustration of a probabilistic time-integrator is the probabilistic 
Euler method already described. In this section we describe two methods which 
showcase the generality of the approach. 

The first is the classical Runge-Kutta method which dehnes a one-step nu¬ 
merical integrator as follows: 

'^h{u) =u-\-^ {ki(u) -\- 2 k 2 (u, h) + 2k^(u, h) + k^iu, h)) 

where 

ki{u) = f{u), k 2 {u, h) = f (u + ^hki{u)) 

kz{u, h) = f{u -f ^hk 2 {u)), ki{u, h) = f{u + hks{u)). 

The method has local truncation error in the form of Assumption 2 with <7 = 4. 
It may be used as the basis of a probabilistic numerical method (2.11), and hence 
(2.9) at the grid-points. Thus, provided that we choose to perturb this integrator 
with a random process Xk satisfying Assumption 1 with p > 4, Theorem 2.2 
shows that the error between the probabilistic integrator based on the classical 
Runge-Kutta method is, in the mean square sense, of the same order of accuracy 
as the deterministic classical Runge-Kutta integrator. 

The second is an integrated Ornstein-Uhlenbeck process, derived as follows. 
Define, on the interval s G [tk,tk+i), the pair of equations 

dU = Vdt, U{tk) = Uk, (2.18a) 

dV = -AVdt + V^dW, V{tk) = fiUk). (2.18b) 


Here IT is a standard Brownian motion and A and E are invertible matrices, 
possibly depending on h. The approximating function g^{s) is thus defined by 
T(s), an Ornstein-Uhlenbeck process. 

Integrating (2.18b) we obtain 

U(s) = exp(-A(s - tk))f{Uk) + Xk{s - 4), (2.19) 


where s G \tk,tk+i) and the {xk} form an i.i.d. sequence of Gaussian random 
functions defined on [0, h] with 


Xk{s) = V^ [ exp[A(T - s)) dW{r). 
Jo 


Note that the h-dependence of comes through the time-interval on which Xk 
is dehned, and through A and E. 

Integrating (2.18a), using (2.19), we obtain 


U{s) = Uk + A exp(-A(s - 4))^/({7fc) +Ckis-tk), 


( 2 . 20 ) 
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where s G [tk,tk+i], and, for t G [0, h], 



It. 


( 2 . 21 ) 


The numerical method (2.20) may be written in the form (2.11), and hence (2.9) 
at the grid-points, with the definition 



This integrator is first order accurate and satisfies Assumptions 2 with p = 1. 
Choosing to scale S with h so that q > 1 in Assumptions 1 leads to convergence 
of the numerical method with order 1. 

Had we carried out the above analysis in the case A = 0 we would have 
obtained the probabilistic Euler method (2.13), and hence (2.12) at grid points, 
used as our canonical example in the earlier developments. 

2.4- Backward error analysis 

The idea of backward error analysis is to identify a modified equation which is 
solved by the numerical method either exactly, or at least to a higher degree 
of accuracy than the numerical method solves the original equation. In the 
context of differential equations this modihed equation will involve the step-size 
h. In the setting of ordinary differential equations, and the random integrators 
introduced in this section, we will show that the modified equation is a stochastic 
differential equation (SDE) in which only the matrix Q from Assumption 1 
enters; the details of the random processes used in our construction do not enter 
the modified equation. This universality property underpins the methodology 
we introduce as it shows that many different choices of random processes all 
lead to the same effective behaviour of the numerical method. 

We introduce the operators C and defined so that, for all (j) G C'°°(IR"’, K), 


= (e''^ </>)(«), E^{Ui\Uo = u) = cf){u). (2.22) 

Thus C := f -V and is the kernel for the Markov chain generated by the 
probabilistic integrator (2.2). In fact we never need to work with itself in 
what follows, only with , so that questions involving the operator logarithm 


do not need to be discussed. 

We now introduce a modified ODE and a modified SDE which will be needed 
in the analysis that follows. The modihed ODE is 



(2.23) 


whilst the modihed SDE has the form 


du = f{u)dt + y^h^PQdW. 


( 2 . 24 ) 
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The precise choice of is detailed below. Letting E denote expectation with 
respect to W, we introduce the operators and so that, for all (j) G 

(j){u{h)\u{Q) = u) = E0(u(h)|M(O) = O) = </')(u). (2.25) 

Thus 

£'‘:=/^.V, = p-W+^h^PQ :VW. (2.26) 

where : denotes the inner product on which induces the Frobenius norm, 

that is, ^ : i? = trace(A^i?). 

The fact that the deterministic numerical integrator has uniform local trun¬ 
cation error of order g -I- 1 (Assumption 2) implies that, since </> G C°°, 

- ct>{^h{u)) = 0(h«+i). (2.27) 

The theory of modified equations for classical one-step numerical integration 
schemes for ODEs [12] establishes that it is possible to find in the form 

q+l 

/'*:=/ + ^h7z (2.28) 

i^q 

e'‘^V(w) - <(>(^h(M)) = 0(h«+2+'). (2.29) 

We work with this choice of in what follows. 

Now for our stochastic numerical method we have 

HUk+i) = H'i’hiUk))+^kih)-VH'i’hiUk)) 

Furthermore the last term has mean of size 0{\^k{h)\‘^). From Assumption 1 we 
know that E^ (^fc(^)^fc (^)) = ■ Thus 

e'‘^"</>(n) - <(.(^7^^)) = : VV(/.(^7n)) + 0{h^P+^). (2.30) 

From this it follows that 

- <(>(^/^(w)) = : VV(^(u) + 0(7 p+2). 

Finally we note that (2.26) implies that 

e^2V(«) - 

= : VV(/)(m) -b 

= (/ + 0(h)) : VV<(.(u) + 0{h^P+^)) 


(2.31) 
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Thus we have 

: W<p{u) + 0{h‘^P+^). (2.32) 

Now using (2.29), (2.31) and (2.32) we obtain 

</)(«) - = Oih^P+^) + 0(h9+2+'). (2.33) 

Balancing these terms, in what follows we make the choice ^ = 2p — g. If ^ < 0 
we adopt the convention that the drift is simply /. With this choice of q we 
obtain 

e^^''cj){u) - = 0{h^P+^). (2.34) 

This demonstrates that the error between the Markov kernel of one-step of 
the SDE (2.24) and the Markov kernel of the numerical method (2.2) is of order 
0{h‘^P'^'^). Some straightforward stability considerations show that the weak 
error over an 0(1) time-interval is 0{h'^P~^^). We make assumptions giving this 
stability and then state a theorem comparing the weak error with respect to the 
modified Equation (2.24), and the original Equation (2.1). 

Assumption 3. The function f is in C°° and all its derivatives are uniformly 
bounded on M". Furthermore f is such that the operators e^^ and e^^ satisfy, 
for all i/ € ^““(K",®) and some L > 0, 

sup \e^^'if>{u)\ < {1 + Lh) sup \'if>{u)\, 
uGR" uGR" 

sup tp{u)\ < {1 + Lh) sup \tft{u)\. 

Remark 2.3. Ifp = q in what follows (our recommended choice) then the weak 
order of the method coincides with the strong order; however, measured relative 
to the modified equation, the weak order is then one plus twice the strong order. 
In this case, the second part of Theorem 2.2 gives us the first weak order result 
in Theorem 2./. Additionally, Assumption 3 is stronger than we need, but allows 
us to highlight probabilistic ideas whilst keeping overly technical aspects of the 
numerical analysis to a minimum. More sophisticated, but structurally similar, 
analysis would be required for weaker assumptions on f. Similar considerations 
apply to the assumptions on (f. 

Theorem 2.4. Consider the numerical method (2.9) and assume that Assump¬ 
tions 1 and 3 are satisfied. Then, for (j) S C°° function with all derivatives 
bounded uniformly on K", we have that 

\(j){u{T)) - E^((^(17fe))| < ^ 

and 

|E(<^(w(r))) -E''(</>(17fc))| < 
where u and u solve (2.1) and (2.24) respectively. 


kh = T, 
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Proof. We prove the second bound first. Let Wk = E(^({t(tfe))|'u(0) = u) and 
Wk = E^((/)({7fc)|f7o = u). Then let Sk = sup„gR„ \ Wk — Wk\. It follows from the 
Markov property that 

Wk+i - Wk+i = e^^''Wk - 
= 

Using (2.34) and Assumption 3 we obtain 

4+1 < il + Lh)6k + Oih^P+^). 

Iterating and employing the Gronwall inequality gives the second error bound. 

Now we turn to the first error bound, comparing with the solution u of the 
original equation (2.1). From (2.31) and then (2.27) we see that 

- (l){^h{u)) = e>(/i^P+^), 

e'‘^(/.(ii) - = C>(/i™{2p+i,9+i})^ 

This gives the first weak error estimate, after using the stability estimate on 
from Assumption 3. □ 


Example 2.5. Consider the probabilistic integrator derived from the Euler 
method in dimension n = 1. We thus have q = 1, and we hence set p = 1. 
The results in [10] allow us to calculate f^ with I = 1. The preceding theory 
then leads to strong order of convergence 1, measured relative to the true ODE 
(2.1), and weak order 3 relative to the SDE 

du = (^f{u) - ^f{u)f{u) + ^ {f{u)f{u) + 4(/'(m))V(u))^ dt + VChdW. 

With these results now available, the following section provides an empirical 
study of our probabilistic integrators. 

3. Statistical Inference and Numerics 

This section explores applications of the randomised ODE solvers developed 
in Section 2. First, we study forward uncertainty propagation and propose 
a method for calibrating the problem-dependent scaling constant, cr, against 
classic error indicators.^ Secondly, we employ the calibrated measure within 
Bayesian inference problems, demonstrating that the resulting posteriors ex¬ 
hibit more consistent behaviour over varying step-sizes than with naive use 

^Recall that throughout we assume that, within the context of Assumption 1, Q = al. 
More generally it is possible to calibrate an arbitrary positive semi-definite Q. 



P. Conrad et al./Probabilistic Solutions of Differential Equations 


15 


of a deterministic integrators. Throughout this section we use the FitzHugh- 
Nagumo model to illustrate ideas [21]. This is a two-state non-linear oscillator, 
with states (V, R) and parameters (a, 6, c), governed by the equations 


dt 


= c 



dR 

dt 


1 

c 


{V-a + bR). 


(3.1) 


This particular example does not satisfy the stringent Assumptions 2 and 3 
and the numerical results shown demonstrate that, as indicated in Remarks 
2.1 and 2.3, our theory will extend to weaker assumptions on /, something we 
will address in future work. 




rxro 

h = 0.02 h = 0.01 h = 0.005 


r\/Xf 


h = 0.002 h = 0.001 h = 0.5 x 10" ^ 

rxnj 

) 5 10 15 20 

rxr^J 
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_ 1 _1_1_ 

0 5 10 15 2 


t t t 


Fig 2: The true trajectory of the V species of the FitzHugh-Nagumo model 
(red) and one hundred realisations from a probabilistic Euler ODE solver with 
various step-sizes and noise scale cr = .1 (blue). 


3.1. Calibrating forward uncertainty propagation 

Consider Equation (3.1) with fixed initial conditions E(0) = —l,i?(0) = 1, and 
parameter values (.2, .2, 3). Figure 2 shows draws of the V species trajectories 
from the measure associated with the probabilistic Euler solver with p = q — 
1, for various values of the step-size and fixed a = 0.1. The contraction of 
the measure towards the reference solution, as h shrinks, is clearly evident. 
Furthermore, the uncertainty exhibits interesting, non-Gaussian structure where 
trajectories disagree whether to begin the steep phase of the oscillation. 

Although the rate of contraction is governed by the underlying deterministic 
method, the scale parameter, cr, completely controls the apparent uncertainty in 
the solver. To illustrate this. Figure 3 fixes the step-size, h = 0.1, and shows that 
rescaling the noise can create any apparent level of certainty desired, including 
high confidence in an incorrect solution. This tuning problem exists in general, 
since cr is problem dependent and cannot obviously be computed analytically. 
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Fig 3: The true trajectory of the V species of the FitzHugh-Nagumo model 
(red) and one hundred random draws from a probabilistic Euler ODE solver 
with step-size h = .1 and various noise scalings (blue). Observe that for small a 
the draws concentrate away from the true solution. The value of tr = 0.2 appears 
appropriate, since the true value lies comfortably within the solution draws. 


Therefore, we propose to calibrate a to replicate the amount of error sug¬ 
gested by classical error indicators. In the following discussion, we often explic¬ 
itly denote the dependence on h and a with superscripts, hence the probabilistic 
solver is and the corresponding deterministic solver is 17^’°. Define the true 
error as e{t) = u{t) — Then we assume there is some computable er¬ 

ror indicator E{t) « e(t), defining Ek = E{tk). The simplest error indicators 
might compare differing step-sizes, E{t) = (t), or differing order 

methods, as in a Runge-Kutta 4-5 scheme. 

We proceed by constructing a probability distribution that is max¬ 

imised when the overall amount of error produced by our probabilistic inte¬ 
grator matches that suggested by the error indicator. We perform this scale 
matching by: (i) using a Gaussian approximation of our random solver at each 
step k, = A/’(E(17^’‘^), V(t/^’'^)); and (ii) by constructing a second Gaussian 
measure from the deterministic solver, and the available error indicator, 

Efe, — A/’(17^’°, {EkY). Thus (ii) is a natural Gaussian representation of the 
uncertainty based upon information from deterministic methods. We construct 
7r((T) by penalising the distance between these two normal distributions at every 

step: 7r(cr) oc Jlfc ^^p c?(/ij(’‘^, Z 2 ^)^. We find that the Bhattacharyya distance 

(closely related to the Hellinger metric) works well [16], since it diverges quickly 
if either the mean or variance differs between the two inputs. Computing TT{a) 
requires the mean and variance of the process , which can be estimated 
using Monte Carlo. If the ODE state is a vector, we take the product of the uni¬ 
variate Bhattacharyya distances. Note that this calibration depends not only on 
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the problem of interest, but also on the initial conditions and any parameters 
of the ODE. 

Returning to the FitzHugh-Nagumo model, we apply this strategy for choos¬ 
ing the noise scale. We sampled from 7r(cr) using pseudomarginal MCMC [1]. 
The sampled densities are shown in Figure 4; the similarity in the inferred a 
suggests that the asymptotic convergence rate is nearly exact in this case. Since 
this distribution is well peaked, MCMC mixes very quickly and is well repre¬ 
sented by the MAP value, hence we proceed using a* = argmax7r((T). Next, 
we examine the quality of the scale matching by plotting the magnitudes of the 
random variation against the error indicator in Figure 5, for several different 
step-sizes, observing good agreement of the marginal variances. Although the 
marginal variances are empirically matched to the error indicator, our measure 
is still reveals non-Gaussian structure and correlations in time not revealed by 
the deterministic analysis. As described, this procedure requires hxed inputs to 
the ODE, but it is straightforward to marginalise out a prior distribution over 
input parameters. 



ID fN 

odd 
I I I 


Fig 4: The sampled density of Tr{a) in the FitzHugh-Nagumo model, for several 
different step-sizes. 


3.2. Constructing Bayesian posterior inference problems 

Given the calibrated probabilistic ODE solvers described above, let us consider 
how to incorporate them into inference problems. 

Assume we are interested in inferring parameters of the ODE given noisy 
observations of the state. Specifically, we wish to infer parameters 0 G for 
the differential equation it = f(u,6), with fixed initial conditions u(t = 0) = Mq 
( a straightforward modification may include inference on initial conditions). 
Assume we are provided with data d G K™, dj = u{Tj) -\- rjj at some collection 
of times Tj, corrupted by i.i.d. noise, rjj ^ A/’(0,r). If we have prior Q(d), the 
posterior we wish to explore is, P(6* | d) oc Q{d)C{d,u{9)), where density C 
compactly summarises this likelihood model. 
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Fig 5: A comparison of the error indicator for the V species of the FitzHugh- 
Nagumo model (blue) and the observed variation in the calibrated probabilistic 
solver. The red curves depict fifty samples of the magnitude of the difference 
between a standard Euler solver for several step-sizes and the equivalent ran¬ 
domised variant, using a*, maximising 7r(cr). 


The standard computational strategy is to simply replace the unavailable 
trajectory u with a numerical approximation, inducing approximate posterior 
I d) oc Q(6*)£(d, U^’^{9)). Informally, this approximation will be accurate 
when the error in the numerical solver is small compared to F and often con¬ 
verges formally to P(0 | d) as h —>■ 0 [6]. However, the error in might be 
non-Gaussian and highly correlated, which could make it surprisingly easy for 
errors at finite h to have substantial impact. 

In this work, we are concerned about the undue optimism in the predicted 
variance; without a rigorous treatment of the solver uncertainty, the posterior 
can concentrate around an arbitrary parameter value even when the determinis¬ 
tic solver is inaccurate, and is merely able to reproduce the data by coincidence. 
The more conventional concern is that any error in the solver will be trans¬ 
ferred into posterior bias. Practitioners commonly alleviate both concerns by 
tuning solvers until they are nearly perfect, however, we note that this may be 
computationally prohibitive in many contemporary statistical applications. 

We can construct a different posterior that includes the uncertainty in the 
solver by taking an expectation over random solutions to the ODE 

P'*’'"(6i I d) oc Q(6i) J C{d, ^))d^. (3.2) 

Intuitively, this construction favours parameters that exhibit agreement with 
the entire family of uncertain trajectories, not just the one produced by the 
deterministic integrator. The typical effect of this expectation is to increase the 
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posterior uncertainty on 0, preventing the inappropriate posterior collapse we 
are concerned about. Indeed, if the integrator cannot resolve the underlying 
dynamics, will be large. Then is independent of 9, hence the 

prior is recovered, | d) « Q(d). 

Notice that as h —)■ 0, both the measures and P^’'^ collapse to the poste¬ 
rior using the analytic solution, P, hence both methods are correct. We do not 
expect the bias of P^’'^ to be improved, since all of the averaged trajectories are 
of the same quality as the deterministic solver in P^>o. We now construct an 
analytic inference problem demonstrating these behaviours. 

Example 3.1. Consider inferring the initial condition, uq G M, of the scalar 
linear differential equation, u = Xu, with A > 0. We apply a numerical method to 
produce the approximation Uk ~ u(kh). We observe the state at some times t = 
kh, with additive noise rjk ~ A/'(0, 7 ^).' dk = Uk + Pk- If we use a deterministic 
Euler solver, the model predicts Uk = {^ + hX)^UQ. These model predictions 
coincide with the slightly perturbed problem 

^ = h~^ log(l -I- Xh)u, 
dt 

hence error increases with time. However, the assumed observational model does 
not allow for this, as the observation variance is 7 ^ at all times. 

In contrast, our proposed probabilistic Euler solver predicts 

fc-i 

Uk = {I + hXfuo + + Xh)’^-^-\ 

j=o 


where we have made the natural choice p = q, where a is the problem dependent 
scaling of the noise and the fk are i.i.d. A/'(0,1). Eor a single observation, rjk 
and every fk are independent, so we may rearrange the equation to consider the 
perturbation as part of the observation operator. Hence, a single observation at 
k has effective variance 


ll ■=! 


2 


k— 1 

+ = 72 + a^h^ 

j=o 


(l-bA/l) 2 '=- 1 

{1 + xhy-i ■ 


Thus, late time observations are modelled as being increasingly inaccurate. 

Consider inferring uq, given a single observation dk at time k. If a Gaussian 
prior Af{mo, Co) is specified for uq, then the posterior is Af{m, C^), where 


r" = 


(1 -f /lA)2'= 


+ Co 


C = 


(1 + hX)^dk ._2 

- + Co "^0- 


ll 


The observation precision is scaled by (1 -|- hX)^^ because late time data con¬ 
tain increasing information. Assume that the data are dk = -I- 777 ^, for 

some given true initial condition Mq o.Pd noise realisation 77 b Consider now the 
asymptotic regime where h is fixed and fc —>■ 00 . For the standard Euler method. 
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where jh = 1, we see that 0, whilst m x ((1 + hX)~^e^^)^UQ. Thus the 

inference scheme becomes increasingly certain of the wrong answer: the variance 
tends to zero and the mean tends to infinity. 

In contrast, with a randomised integrator, the fixed h, large k asymptotics are 


1 


Co"^ + A(2 + \h)a-^h- 


m : 


{{l + hX)-^e 


hX 


sk I 
) ^^0 


l + Co” V2/i2A-i(2 + A/i)- 


Thus, the mean blows up at a modified rate, but the variance remains positive. 

We take an empirical Bayes approach to choosing a, that is, using a con¬ 
stant, fixed value a* = argmax7r(cr), chosen before the data is observed. Joint 
inference of the parameters and the noise scale suffer from well-known MCMC 
mixing issues in Bayesian hierarchic models. 

We now return to the FitzHugh-Nagumo model; given fixed initial condi¬ 
tions, we attempt to recover parameters 9 = (a, b, c) from observations of both 
species at times t = 1,2,..., 40. The priors are log-normal, centred on the true 
value with unit variance, and with observational noise F = 0.001. The data is 
generated from a high quality solution, and we perform inference using Euler 
integrators with various step-sizes, h G {0.005,0.01,0.02,0.05,0.1}, spanning a 
range of accurate and inaccurate integrators. 

We first perform the inferences with naive use of deterministic Euler inte¬ 
grators. We simulate from each posterior using delayed rejection MCMC [9] for 
100,000 steps, discarding the first 10% as burn in. The posteriors are shown in 
Figure 6. Observe the undesirable concentration of every posterior, even those 
with poor solvers; the posteriors are almost mutually singular, hence clearly the 
posterior widths are meaningless. 

Secondly, we repeat the experiment using our probabilistic Euler integrators, 
with results shown in Figure 7. We use a noisy pseudomarginal MCMC method, 
whose fast mixing is helpful for these initial experiments [20]. These posteriors 
are significantly improved, exhibiting greater mutual agreement and obvious 
increasing concentration with improving solver quality. The posteriors are not 
perfectly nested, possible evidence that our choice of scale parameter is imper¬ 
fect, or that the assumption of locally Gaussian error deteriorates. As expected, 
the bias of these posteriors is essentially unchanged. 


4. Probabilistic Solvers for Partial Differential Equations 

We now turn to present a framework for probabilistic solutions to partial dif¬ 
ferential equations, working within the finite element setting. Our approach to 
PDE problems is analogous to that for ODEs, except that now we randomly 
perturb the finite element basis functions. The assumptions required, and the 
strong convergence results, closely resemble their ODE counterparts. 
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9i 



Fig 6: The posterior marginals of the FitzHugh-Nagumo inference problem using 
deterministic integrators with various step-sizes. 



Fig 7: The posterior marginals of the FitzHugh-Nagumo inference problem using 
probabilistic integrators with various step-sizes. 
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4 . 1 . Probabilistic finite element method for variational problems 

Let V be a Hilbert space of real valued functions defined on a bounded polygonal 
domain D C Consider a weak formulation of a linear PDE specified via a 
symmetric bilinear form a : V x V —K, and a linear form r : V —> K to give 
the problem of finding u GV : a{u,v) = r(v), Vv G V. This problem can be 
approximated by specifying a finite dimensional subspace C V and seeking a 
solution in instead. This leads to a finite dimensional problem to be solved 
for the approximation U: 

U GV’^ : a{U, v) = r{v), Vv G V^. (4.1) 

This is known as the Galerkin method. 

We will work in the setting of finite element methods, assuming that = 
spa.n{(j)j}j^^ where (j)j is locally supported on a grid of points The 

parameter h is introduced to measure the diameter of the finite elements. We 
will also assume that 

^j{xk] — (^■^) 

Any element U G can then be written as 

,7 

U{x) = Y,U,cjt,{x) (4.3) 

1=1 

from which it follows that U{xk) = Uk- The Galerkin method then gives the 
following equation for U = {Ui, ... ,Uj)^ : 

AD = r (4.4) 

where Ajk = a{(j)j,4)k) and Vk = r{4>k)- 

In order to account for uncertainty introduced by the numerical method, we 
will assume that each basis function (pj can be split into the sum of a systematic 
part (jfj and random part 0^, where both pj and 0^- satisfy the nodal property 
(4.2), hence pjixk) = 0. Furthermore we assume that each 0^- shares the same 
compact support as the corresponding 0®, preserving the sparsity structure of 
the underlying deterministic method. 


4-2. Strong convergence result 

As in the ODE case, we begin our convergence analysis with assumptions con¬ 
straining the random perturbations and the underlying deterministic approx¬ 
imation. The bilinear form a(-, •) is assumed to induce an inner-product, and 
then norm via |j • ||^ = a(-, •); furthermore we assume that this norm is equiva¬ 
lent to the norm on V. Throughout denotes expectation with respect to the 
random basis functions. 
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Assumption 4. The collection of random basis functions {4>'j}j^i are inde¬ 
pendent, zero-mean, Gaussian random fields, each of which satisfies 4>j(xk) = 0 
and shares the same support as the corresponding systematic basis function (jfj. 
For all j, the number of basis functions with index k which share the support 
of the basis functions with index j is bounded independently of J, the total 
number of basis functions. Furthermore the basis functions are scaled so that 

Assumption 5. The true solution u of problem (4.1) is in L°°{D). Furthermore 
the standard deterministic interpolant of the true solution, defined by := 
J2j=i ’^{^3)4’)^ satisfies ||m - v^\\a < Ch‘‘. 

Theorem 4.1. Under Assumptions 4 O'^d 5 it follows that the approximation 
U, given by (4.1), satisfies 

E^\\u-U\\l < c;/i2™{p.9}_ 

As for ODEs, the solver accuracy is limited by either the amount of noise 
injected or the convergence rate of the underlying deterministic method; the 
choice p = q is natural since it introduces the maximum amount of noise that 
does not disturb the deterministic rate of convergence. 

Proof. Recall the Galerkin orthogonality property which follows from subtract¬ 
ing the approximate variational principle from the true variational principle: it 
states that, for e = m — C/, 

a(e,u)=0, (4.5) 

From this it follows that 

||e||a < ||m - ■c||a, VvGV'". (4.6) 

To see this note that, for any v S V^, the orthogonality property (4.5) gives 

a(e, e) = a{e,e-\-U — v) = a{e,u — v). (4-7) 

Thus, by Cauchy-Schwarz, ||e||^ < ||e||a||w — u||q, \/v G implying (4.6). We 

now set, for v G V, 

J .J .] 

= '^<X3)4>3{x) =^u{xj)4>]{x)+^u{xj)4>]{x) 

i=i i=i i=i 

=: n^(a;)-I-n''(a;). 

By the mean-zero and independence properties of the random basis functions 
we deduce that 

E^\\u-v\\l = E'^a{u-v,u-v)=E^a{u-v^,u-v^)+E'^a{v/F) 

J 

The result follows from Assumptions 4 and 5. □ 
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4 . 3 . Poisson solver in two dimensions 

Consider a Poisson equation with Dirichlet boundary conditions in dimension 
d = 2, namely 


—Am = /, X € D, 
u = 0, X £ dD. 

We set V = Hq{D) and H to be the space L‘^{D) with inner-product (•, •) and 
resulting norm 1-1^ = (•, •)• The weak formulation of the problem has the form 
(4.1) with 

a(u,v) = / Vu(x)Vv(x)dx, r(v) = {f,v}. 

Jd 

Now consider piecewise linear hnite elements satisfying the assumptions of Sec¬ 
tion 4.2 in [15] and take these to comprise the set Then h measures 

the width of the triangulation of the finite element mesh. Assuming that f £ H 
it follows that u £ H^{D) and that 

< C'/l||u||// 2 . (4.8) 

Thus q = 1. We choose random basis members so that Assumptions 4 

hold with p = 1. Theorem 4.1 then shows that, ior e = u — U, E^||e||^ < CK^. 
Note, however, that in the deterministic case we expect an improved rate of 
convergence in the function space H. We show that such a result also holds in 
our random setting. 

Note that, under Assumptions 4, if we introduce 7 ^^ that is 1 when two basis 
functions have overlapping support, and 0 otherwise, then is symmetric 
and there is constant C, independent of j and J, such that Ijk — 

Now let ip solve the equation a((p,v) = (e,v}, Vv £ V. Then < Clel. 

We dehne (p^ and p'' in analogy with the definitions of and v/ Following 
the usual arguments for application of the Aubin-Nitsche trick [15], we have 
|ep = a(e, p) = o(e, p — p^ — p''). Thus 

|ep < llelUII^ - - /lU < V2\\e\\u{\\p - + ||^^||^) '■ (4.9) 

We note that p'{x) = where, by Sobolev 

embedding (d = 2 here), aj := p{xj)/\\ p\\h '2 satisfies maxi<j<j \aj \ < C. Note, 
however, that the aj are random and correlated with all of the random basis 
functions. Using this, together with (4.8), in (4.9), we obtain 

|ep < C||e||a(/i^ + ||^aj^;-(x)||^)"||^||// 2 . 
i=i 

We see that 

.7 .7 1 

|e| < C\\e\\a(^h^ + ■ 

j=i k=i 
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From this and the symmetry of 'yjk, we obtain 

j=l k=l 

<C\\e\u{h^ + 2C^UXy- 

i=i 

Taking expectations, using that p = q = 1, we find, using Assumptions 4, that 

E^|e| < (^^(E^llell^) ^ < Ch^ as desired. Thus we recover the extra order of 
convergence over the rate 1 in the || • ||o norm (although the improved rate is in 
whilst the lower rate of convergence is in L^{n-,V).) 

5. PDE Inference and Numerics 

We now perform numerical experiments using probabilistic solvers for elliptic 
PDFs. Specifically, we perform inference in a ID elliptic PDE, V• {k{x)Wu{x)) = 
ix for X S [0,1], given boundary conditions u(0) = 0,u(l) = 2. We represent 
logK as piecewise constant over ten equal-sized intervals; the first, on cc S [0, .1) 
is fixed to be one to avoid non-identifiability issues, and the other nine are 
given a prior 9i = log^i ~ A/'(0,1). Observations of the field u are provided 
at cc = (0.1,0.2,... 0.9), with i.i.d. Gaussian error, A/'(0,10“®); the simulated 
observations were generated using a fine grid and quadratic finite elements, 
then perturbed with error from this distribution. 

Again we investigate the posterior produced at various grid sizes, using both 
deterministic and randomised solvers. The randomised basis functions are draws 
from a Brownian bridge conditioned to be zero at the nodal points, implemented 
in practice with a truncated Karhunen-Loeve expansion. The covariance oper¬ 
ator may be viewed as a fractional Laplacian, as discussed in [19]. The scaling 
cr is again determined by maximising the distribution described in Section 3.1, 
where the error indicator compares linear to quadratic basis functions, and we 
marginalise out the prior over the Ki values. 

The posteriors are depicted in Figures 8 and 9. As in the ODE examples, 
the deterministic solvers lead to incompatible posteriors for varying grid sizes. 
In contrast, the randomised solvers suggest increasing confidence as the grid 
is refined, as desired. The coarsest grid size uses an obviously inadequate ten 
elements, but this is only apparent in the randomised posterior. 

6. Conclusions 

We have presented theory and methods for probabilistic approaches to the nu¬ 
merical solution of both ordinary and partial differential equations. These meth¬ 
ods give rise to a probability measure whose qualitative and quantitative prop¬ 
erties account for and quantify the uncertainty induced by finite dimensional 


P. Conrad et al./Probabilistic Solutions of Differential Equations 


26 


9i 


nl /\ 

02 


— h=l/10 

- h=l/20 

- h=l/40 

— h=i/no 
- h=l/80 

03 

04 

\ 

. 1 


* ^ 


II 

i 

% 


. ^ 


COCS — C;-«TCSC3T« C OJ fC'l C OJ 


Fig 8: The marginal posterior distributions for the first four coefficients in ID 
elliptic inverse problem using a classic deterministic solver with various grid 
sizes. 
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Fig 9: The marginal posterior distributions for the first four coefficients in ID 
elliptic inverse problem using a randomised solver with various grid sizes. 
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approximation. We provide a theoretical analysis of the properties of these prob¬ 
abilistic integrators and demonstrate that they induce more coherent inferences 
in illustrative empirical examples. This work opens the door for statistical analy¬ 
sis that explicitly incorporates numerical uncertainty in many important classes 
of contemporary statistical problems across the sciences, including engineering, 
climatology, and biology. 

Drawing parallels to model error [18], we can consider our intrusive modifica¬ 
tions to be a highly-specialised discrepancy model, designed using our intimate 
knowledge of the structure and properties of numerical methods. Further study 
is required to compare our approach to existing methods, and we hope to develop 
other settings where modifying the internal structure of numerical methods can 
produce computationally and analytically tractable measures of uncertainty. De¬ 
veloping robust, practical solvers, and efficiently performing computations with 
the additional uncertainty we propose remains a challenge. Finally, future work 
may be able to exploit our coherent uncertainty propagation to optimise the 
tradeoff between solver cost and statistical variance. 


References 

[1] Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach 
for efficient Monte Carlo computations. The Annals of Statistics 697-725. 

[2] Arnold, A., Calvetti, D. and Somersalo, E. (20f3). Linear multistep 
methods, particle filtering and sequential Monte Carlo. Inverse Problems 
29 085007. 

[3] Capistran, M., Christen, J. A. and Donnet, S. (2013). Bayesian Anal¬ 
ysis of ode’s: solver optimal accuracy and Bayes factors. arXiv preprint 
arXiv:1311.2281. 

[4] Chkrebtii, O. a., Campbell, D. A., Girolami, M. A. and Calder- 
HEAD, B. (2013). Bayesian uncertainty quantification for differential equa¬ 
tions. arXiv preprint arXiv:1306.2365. 

[5] COULIBALY, I. and Lecot, C. (1999). A quasi-randomized Runge-Kutta 
method. Mathematics of Computation of the American Mathematical So¬ 
ciety 68 651-659. 

[6] Dashti, M. and Stuart, A. M. (2016). The Bayesian Approach to In¬ 
verse Problems. In Handbook of Uncertainty Quantification (R. Ghanem, 
D. Higdon and H. Owhadi, eds.) Springer. 

[7] Diaconis, P. (1988). Bayesian numerical analysis. Statistical Decision 
Theory and Related Topics IV 1 163-175. 

[8] Eriksson, K. (1996). Computational Differential Equations v. 1. Cam¬ 
bridge University Press. 

[9] Haario, H., Laine, M., Mira, A. and Saksman, E. (2006). DRAM: 
efficient adaptive MCMC. Statistics and Computing 16 339-354. 

[10] Hairer, E., Lubich, C. and Wanner, G. (2006). Geometric numerical 
integration: structure-preserving algorithms for ordinary differential equa¬ 
tions 31. Springer Science & Business Media. 


P. Conrad et al./Probabilistic Solutions of Differential Equations 


28 


[11] Hairer, E., McLachlan, R. I. and Razakarivony, A. (2008). Achiev¬ 
ing Brouwer?s law with implicit Runge-Kutta methods. BIT Numerical 
Mathematics 48 231-243. 

[12] Hairer, E., N0rsett, S. P. and Wanner, G. (1993). Solving Ordinary 
Differential Equations I: Nonstiff Problems. Solving Ordinary Differential 
Equations. Springer. 

[13] Hennig, P. and Hauberg, S. (2014). Probabilistic Solutions to Differen¬ 
tial Equations and their Application to Riemannian Statistics. In Proc. of 
the nth int. Conf. on Artificial Intelligence and Statistics (AISTATS) 33. 

[14] Hennig, P., Osborne, M. A. and Girolami, M. (2015). Probabilistic 
Numerics and Uncertainty in Computations. Proceedings of the Royal So¬ 
ciety A in press. 

[15] Johnson, C. (2012). Numerical Solution of Partial Differential Equations 
by the Finite Element Method. Dover Books on Mathematics Series. Dover 
Publications, Incorporated. 

[16] Kailath, T. (1967). The Divergence and Bhattacharyya Distance Mea¬ 
sures in Signal Selection. Communication Technology, IEEE Transactions 
on 15 52-60. 

[17] Kaipio, J. and Somersalo, E. (2007). Statistical inverse problems: dis¬ 
cretization, model reduction and inverse crimes. Journal of Computational 
and Applied Mathematics 198 493-504. 

[18] Kennedy, M. C. and O’Hagan, A. (2001). Bayesian calibration of com¬ 
puter models. Journal of the Royal Statistical Society: Series B (Statistical 
Methodology) 63 425-464. 

[19] Lindgren, F., Rue, H. and Lindstrom, J. (2011). An explicit link be¬ 
tween Gaussian fields and Gaussian Markov random fields: the stochastic 
partial differential equation approach. Journal of the Royal Statistical So¬ 
ciety: Series B (Statistical Methodology) 73 423-498. 

[20] Medina-Aguayo, F. J., Lee, A. and Roberts, G. O. (2015). Stability 
of Noisy Metropolis-Hastings. ArXiv e-prints. 

[21] Ramsay, J. O., Hooker, G., Campbell, D. and Gao, J. (2007). Param¬ 
eter estimation for differential equations: a generalized smoothing approach. 
Journal of the Royal Statistical Society: Series B (Statistical Methodology) 
69 741-796. 

[22] SCHOBER, M., Duvenaud, D. K. and Hennig, P. (2014). Probabilistic 
ODE solvers with Runge-Kutta means. In Advances in Neural Information 
Processing Systems 739-747. 

[23] Skilling, J. (1992). Bayesian solution of ordinary differential equations. 
In Maximum Entropy and Bayesian Methods 23-37. Springer. 

[24] Stengle, G. (1995). Error analysis of a randomized numerical method. 
Numerische Mathematik 70 119-128. 

[25] Sullivan, T. (2016). Uncertainty Quantification. Springer. 



